AITopics | decision point

Frontier reasoning models are produced by posttraining base language models with reinforcement learning. Recent work has challenged this by showing that sampling from a sharpened version of the base model's distribution, a so-called power distribution, elicits comparable reasoning without additional training, curated datasets, or verifiers. However, making this method practical requires efficiently sampling from the power distribution. A sampler needs to "mix" to the power distribution, which necessitates moving between modes of the target distribution; intuitively, e.g., trying different reasoning strategies. The samplers proposed in prior works repeatedly select a "cut" position in the current reasoning trace uniformly at random and resample the suffix from that position onward. However, reasoning traces typically contain a few consequential decisions (e.g., the choice of proof strategy or algorithm), and we observe that a uniformly chosen cut tends to rewrite local details rather than revisit decision points. We introduce an algorithm (Entropy-Cut Metropolis-Hastings) that uses the base model's next-token entropy as a proxy to identify key decision points and resample from those positions. We empirically verify that entropy jumps are a useful proxy for decision points and, in a stylized model of reasoning, prove that our method's mixing time scales with the number of decisions in a trace rather than with the number of tokens, which can be much larger. Across MATH500, HumanEval, GPQA Diamond, and AIME26, our method consistently improves over baselines and RL-trained models.
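The contrast between a uniformly chosen cut and an entropy-guided one can be sketched in a few lines. This is a minimal illustration, not the paper's Entropy-Cut Metropolis-Hastings algorithm. The abstract does not say how entropy is turned into cut probabilities, so weighting positions in proportion to their next-token entropy is an assumption, and all function names are hypothetical. A full Metropolis-Hastings sampler would also need an acceptance step that corrects for the non-uniform proposal, which is omitted here.

```python
import math
import random


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def power_distribution(probs, alpha):
    """Sharpen a distribution: p_i**alpha / sum_j p_j**alpha."""
    weights = [p ** alpha for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_cut_weights(step_dists):
    """Cut weights proportional to next-token entropy (an assumed rule)."""
    entropies = [token_entropy(d) for d in step_dists]
    total = sum(entropies)
    if total == 0.0:
        # Every step is deterministic: fall back to a uniform cut.
        return [1.0 / len(entropies)] * len(entropies)
    return [h / total for h in entropies]


def sample_cut(step_dists, rng):
    """Draw one cut position according to the entropy-based weights."""
    weights = entropy_cut_weights(step_dists)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Toy trace: position 1 is a high-entropy "decision point".
trace = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
    [0.90, 0.10, 0.00, 0.00],
]
cut_weights = entropy_cut_weights(trace)
cut_position = sample_cut(trace, random.Random(0))
```

In this toy trace the uniform sampler would pick each position with probability 1/3, while the entropy-based weights concentrate most of the mass on position 1, where the model is undecided between four continuations.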

large language model, machine learning, sampler, (20 more...)

arXiv.org Machine Learning

2605.30327

Country: Europe > Austria (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

2280faacd674566a5eace1bd1098f507-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 21:26:21 GMT

artificial intelligence, game theory, machine learning, (17 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Game Theory (0.93)

Block-Coordinate Methods and Restarting for Solving Extensive-Form Games

Neural Information Processing Systems, Apr-25-2026, 21:26:16 GMT

Coordinate descent methods are popular in machine learning and optimization for their simple sparse updates and excellent practical performance. In the context of large-scale sequential game solving, these same properties would be attractive, but until now no such methods were known, because the strategy spaces do not satisfy the typical separable block structure exploited by such methods. We present the first cyclic coordinate-descent-like method for the polytope of sequence-form strategies, which form the strategy spaces for the players in an extensive-form game (EFG). Our method exploits the recursive structure of the proximal update induced by what are known as dilated regularizers, in order to allow for a pseudo block-wise update. We show that our method enjoys an O(1/T) convergence rate to a two-player zero-sum Nash equilibrium, while avoiding the worst-case polynomial scaling with the number of blocks common to cyclic methods. We empirically show that our algorithm usually performs better than other state-of-the-art first-order methods (i.e., mirror prox), and occasionally can even beat CFR+, a state-of-the-art algorithm for numerical equilibrium computation in zero-sum EFGs. We then introduce a restarting heuristic for EFG solving. We show empirically that restarting can lead to speedups, sometimes huge, both for our cyclic method and for existing methods such as mirror prox and predictive CFR+.

artificial intelligence, decision point, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin (0.28)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Efficient Φ-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games

Neural Information Processing Systems, Feb-18-2026, 11:04:45 GMT

In this paper, we develop efficient parameterized algorithms for regimes between these two extremes.

algorithm, artificial intelligence, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Virginia (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

a120382cf4e2e06d94d7ae7ac96fbe25-Paper-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 01:56:22 GMT

algorithm, artificial intelligence, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Texas (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.45)

Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving

Sonoda, Sho, Akiyama, Shunta, Uezato, Yuya

arXiv.org Machine Learning, Feb-12-2026

We develop a theoretical analysis of LLM-guided formal theorem proving in interactive proof assistants (e.g., Lean) by modeling tactic proposal as a stochastic policy in a finite-horizon deterministic MDP. To capture modern representation learning, we treat the state and action spaces as general compact metric spaces and assume Lipschitz policies. To explain the gap between worst-case hardness and empirical success, we introduce problem distributions generated by a reference policy $q$, including a latent-variable model in which proofs exhibit reusable cut/lemma/sketch structure represented by a proof DAG. Under a top-$k$ search protocol and Tsybakov-type margin conditions, we derive lower bounds on finite-horizon success probability that decompose into search and learning terms, with learning controlled by sequential Rademacher/covering complexity. Our main separation result shows that when cut elimination expands a DAG of depth $D$ into a cut-free tree of size $\Omega(\Lambda^D)$ while the cut-aware hierarchical process has size $O(\lambda^D)$ with $\lambda \ll \Lambda$, a flat (cut-free) learner provably requires exponentially more data than a cut-aware hierarchical learner. This provides a principled justification for subgoal decomposition in recent agentic theorem provers.
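To give a sense of scale for the separation, the two size bounds quoted in the abstract can be compared directly. This is only an illustrative calculation: the constants hidden in the asymptotic bounds are set to 1, and the values of $\Lambda$, $\lambda$, and $D$ below are hypothetical, not taken from the paper.

```latex
\[
\frac{\Lambda^{D}}{\lambda^{D}} = \left(\frac{\Lambda}{\lambda}\right)^{D},
\qquad
\Lambda = 4,\ \lambda = 2,\ D = 10
\;\Longrightarrow\;
\frac{4^{10}}{2^{10}} = 2^{10} = 1024 .
\]
```

Even with a modest ratio of 2 between the branching factors, the cut-free tree is about a thousand times larger at depth 10, and the ratio keeps doubling with each additional level.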

artificial intelligence, logic & formal reasoning, machine learning, (21 more...)

arXiv.org Machine Learning

2602.10512

Country:

North America > United States > New York (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)

2280faacd674566a5eace1bd1098f507-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 22:27:56 GMT

block construction strategy, ecyclicpda, gradient computation, (14 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.04)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Game Theory (0.93)

Block-Coordinate Methods and Restarting for Solving Extensive-Form Games

Neural Information Processing Systems, Feb-8-2026, 22:27:52 GMT

A common approach to solving bilinear saddle-point problems (BSPPs) is to use first-order methods, which use local gradient information to iteratively improve the solution and converge to an equilibrium asymptotically.

artificial intelligence, block construction strategy, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Texas (0.04)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Game Theory (0.93)

5763abe87ed1938799203fb6e8650025-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-8-2026, 12:15:20 GMT

algorithm, external regret minimizer, regret minimizer, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.30)

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Neural Information Processing Systems, Dec-24-2025, 01:47:07 GMT

The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium.
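The notion of internal regret mentioned above can be made concrete with a small sketch. It uses the standard textbook definition for a normal-form game (the largest cumulative gain a player could have obtained by replacing every play of one action `a` with another action `b`), not anything specific to this paper. The function name, the payoff matrix, and the play history are hypothetical.

```python
def internal_regret(utility, my_actions, opp_actions, num_actions):
    """Largest cumulative gain from swapping every play of a for b.

    utility[i][j] is this player's payoff for playing i against j.
    """
    best = 0.0
    for a in range(num_actions):
        for b in range(num_actions):
            if a == b:
                continue
            # Only rounds in which action a was actually played count.
            gain = sum(
                utility[b][opp] - utility[a][opp]
                for mine, opp in zip(my_actions, opp_actions)
                if mine == a
            )
            best = max(best, gain)
    return best


# Matching pennies from the row player's perspective.
pennies = [[1, -1], [-1, 1]]
history_mine = [0, 0, 1, 1]
history_opp = [1, 1, 1, 0]
regret = internal_regret(pennies, history_mine, history_opp, 2)
```

Here the row player would have gained 2 in each of the first two rounds by playing action 1 instead of action 0, so that swap accounts for the whole internal regret. A no-internal-regret dynamic keeps this quantity growing sublinearly in the number of rounds.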

extensive-form correlated equilibrium, no-regret learning dynamic, normal-form game, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Reasoning with Sampling: Cutting at Decision Points
